2,995 research outputs found
Bandit Models of Human Behavior: Reward Processing in Mental Disorders
Drawing inspiration from behavioral studies of human decision making, we
propose here a general parametric framework for the multi-armed bandit problem,
which extends the standard Thompson Sampling approach to incorporate reward
processing biases associated with several neurological and psychiatric
conditions, including Parkinson's and Alzheimer's diseases,
attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain.
We demonstrate empirically that the proposed parametric approach can often
outperform the baseline Thompson Sampling on a variety of datasets. Moreover,
from the behavioral modeling perspective, our parametric framework can be
viewed as a first step towards a unifying computational model capturing reward
processing abnormalities across multiple mental conditions. Comment: Conference on Artificial General Intelligence, AGI-1
Concurrent bandits and cognitive radio networks
We consider the problem of multiple users targeting the arms of a single
multi-armed stochastic bandit. The motivation for this problem comes from
cognitive radio networks, where selfish users need to coexist without any side
communication between them, implicit cooperation or common control. Even the
number of users may be unknown and can vary as users join or leave the network.
We propose an algorithm that combines an $\epsilon$-greedy learning rule with a
collision avoidance mechanism. We analyze its regret with respect to the
system-wide optimum and show that sub-linear regret can be obtained in this
setting. Experiments show dramatic improvement compared to other algorithms for
this setting.
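As a rough illustration of the learning rule named in the abstract above, here is a minimal single-user sketch of epsilon-greedy on Bernoulli arms. It is not the paper's algorithm: the collision avoidance mechanism and the multi-user setting are left out, and the function name, arm means, and parameter values are assumptions chosen for the example.

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, horizon=5000, seed=0):
    """Single-user epsilon-greedy on Bernoulli arms (illustrative only).

    arm_means are hypothetical success probabilities, unknown to the learner.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts

counts = epsilon_greedy([0.2, 0.5, 0.8])
print(counts)
```

With a constant epsilon the learner keeps exploring at a fixed rate, so regret grows linearly; sub-linear regret generally needs an exploration rate that decays over time.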
ELAN as flexible annotation framework for sound and image processing detectors
Annotation of digital recordings in humanities research still is, to a large extent, a process that is performed manually. This paper describes the first pattern recognition based software components developed in the AVATecH project and their integration in the annotation tool ELAN. AVATecH (Advancing Video/Audio Technology in Humanities Research) is a project that involves two Max Planck Institutes (Max Planck Institute for Psycholinguistics, Nijmegen, Max Planck Institute for Social Anthropology, Halle) and two Fraunhofer Institutes (Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS, Sankt Augustin, Fraunhofer Heinrich-Hertz-Institute, Berlin) and that aims to develop and implement audio and video technology for semi-automatic annotation of heterogeneous media collections as they occur in multimedia based research. The highly diverse nature of the digital recordings stored in the archives of both Max Planck Institutes poses a huge challenge to most of the existing pattern recognition solutions and is a motivation to make such technology available to researchers in the humanities.
Hi-Val: Iterative Learning of Hierarchical Value Functions for Policy Generation
Task decomposition is effective in manifold applications where the global complexity of a problem makes planning and decision-making too demanding. This is true, for example, in high-dimensional robotics domains, where (1) unpredictabilities and modeling limitations typically prevent the manual specification of robust behaviors, and (2) learning an action policy is challenging due to the curse of dimensionality. In this work, we borrow the concept of Hierarchical Task Networks (HTNs) to decompose the learning procedure, and we exploit Upper Confidence Tree (UCT) search to introduce HOP, a novel iterative algorithm for hierarchical optimistic planning with learned value functions. To obtain better generalization and generate policies, HOP simultaneously learns and uses action values. These are used to formalize constraints within the search space and to reduce the dimensionality of the problem. We evaluate our algorithm both on a fetching task using a simulated 7-DOF KUKA lightweight arm and on a pick-and-delivery task with a Pioneer robot.
School adjustment of ethnic minority youth: A qualitative and quantitative research synthesis of family-related risk and resource factors
In today's multicultural societies, the question of how school adjustment (adapting to the role of being a student) can be promoted for students from ethnic minority backgrounds is of high importance. The ecological approach to acculturation research proposes minority students' school adjustment is shaped by the surrounding context, and it suggests that the microsystem family plays an important role. Specifically, parents' acculturation, practices, attitudes, and background have been identified as key factors. While there exist systematic reviews of the impact of parental factors more broadly, some of which researched ethnic minorities, a comprehensive literature review of family-related factors that affect ethnic minority youth's school adjustment is missing. The present study provides a synthesis of qualitative and quantitative empirical research of interest, including 60 qualitative and 46 quantitative studies. Its content analysis portrays in what ways parental acculturation, practices, attitudes and background can support or hamper school adjustment among ethnic minority youth. A subsequent meta-analysis quantifies the strength of the impact of these parental variables on the school adjustment of their children. Our findings show that parental practices have the most crucial impact on the psychological well-being, academic self-esteem and aspirations, behaviour and achievement outcomes of minority youth.
Bandit Online Optimization Over the Permutahedron
The permutahedron is the convex polytope with vertex set consisting of the
vectors $(\pi(1),\dots,\pi(n))$ for all permutations (bijections) $\pi$ over
$\{1,\dots,n\}$. We study a bandit game in which, at each step $t$, an
adversary chooses a hidden weight vector $s_t$, a player chooses a
vertex $\pi_t$ of the permutahedron and suffers an observed loss of
$\sum_{i=1}^n \pi_t(i) s_t(i)$.
A previous algorithm CombBand of Cesa-Bianchi et al (2009) guarantees a
regret of for a time horizon of . Unfortunately,
CombBand requires at each step an $n$-by-$n$ matrix permanent approximation to
within improved accuracy as $T$ grows, resulting in a total running time that
is super linear in $T$, making it impractical for large time horizons.
We provide an algorithm of regret with total time
complexity . The ideas are a combination of CombBand and a recent
algorithm by Ailon (2013) for online optimization over the permutahedron in the
full information setting. The technical core is a bound on the variance of the
Plackett-Luce noisy sorting process's "pseudo loss". The bound is obtained by
establishing positive semi-definiteness of a family of 3-by-3 matrices
generated from rational functions of exponentials of 3 parameters.
Boosting parallel perceptrons for label noise reduction in classification problems
The final publication is available at Springer via http://dx.doi.org/10.1007/11499305_60
Proceedings of First International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2005, Las Palmas, Canary Islands, Spain, June 15-18, 2005
Boosting combines an ensemble of weak learners to construct a new weighted classifier that is often more accurate than any of its components. The construction of such learners, whose training sets depend on the performance of the previous members of the ensemble, is carried out by successively focusing on the patterns that are harder to classify. This focus degrades boosting's results when dealing with malicious noise such as mislabeled training examples. In order to detect and avoid those noisy examples during the learning process, we propose the use of Parallel Perceptrons. Among other things, these novel machines allow margins for hidden unit activations to be defined naturally. We shall use these margins to detect which patterns may have an incorrect label and also which are safe, in the sense of being well represented in the training sample by many other similar patterns. We shall reduce the weights of the former, as candidates for being noisy examples, and augment the weights of the latter, as a support for the overall detection procedure.
With partial support of Spain's CICyT, TIC 01–572, TIN 2004–0767
On the Prior Sensitivity of Thompson Sampling
The empirically successful Thompson Sampling algorithm for stochastic bandits
has drawn much interest in understanding its theoretical properties. One
important benefit of the algorithm is that it allows domain knowledge to be
conveniently encoded as a prior distribution to balance exploration and
exploitation more effectively. While it is generally believed that the
algorithm's regret is low (high) when the prior is good (bad), little is known
about the exact dependence. In this paper, we fully characterize the
algorithm's worst-case dependence of regret on the choice of prior, focusing on
a special yet representative case. These results also provide insights into the
general sensitivity of the algorithm to the choice of priors. In particular,
with being the prior probability mass of the true reward-generating model,
we prove and regret upper bounds for the
bad- and good-prior cases, respectively, as well as \emph{matching} lower
bounds. Our proofs rely on the discovery of a fundamental property of Thompson
Sampling and make heavy use of martingale theory, both of which appear novel in
the literature, to the best of our knowledge. Comment: Appears in the 27th International Conference on Algorithmic Learning
Theory (ALT), 201
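To make the role of the prior concrete, the following is a minimal sketch of Thompson Sampling in the standard Beta-Bernoulli form. It is a swapped-in textbook variant, not the setting analyzed in the abstract above (which concerns the prior mass placed on the true reward-generating model); the function name, the arm means, and the default uniform Beta(1, 1) priors are assumptions made for illustration.

```python
import random

def thompson_sampling(arm_means, prior=None, horizon=5000, seed=0):
    """Beta-Bernoulli Thompson Sampling (illustrative only).

    arm_means: hypothetical success probabilities, unknown to the learner.
    prior: optional list of (alpha, beta) pairs, one Beta prior per arm;
           defaults to the uniform Beta(1, 1) for every arm.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    if prior is None:
        prior = [(1.0, 1.0)] * n_arms
    params = [list(p) for p in prior]  # mutable posterior parameters
    counts = [0] * n_arms
    for _ in range(horizon):
        # Draw one plausible mean per arm from its current posterior.
        samples = [rng.betavariate(a, b) for a, b in params]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < arm_means[arm] else 0
        # Conjugate update: successes add to alpha, failures to beta.
        params[arm][0] += reward
        params[arm][1] += 1 - reward
        counts[arm] += 1
    return counts

counts = thompson_sampling([0.3, 0.7])
print(counts)
```

Passing a different `prior`, for example one that concentrates mass on the wrong arm, is a simple way to observe how a misleading prior delays convergence in this toy setting.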
Pilot, Rollout and Monte Carlo Tree Search Methods for Job Shop Scheduling
Greedy heuristics may be attuned by looking ahead for each possible choice,
in an approach called the rollout or Pilot method. These methods may be seen as
meta-heuristics that can enhance (any) heuristic solution, by repetitively
modifying a master solution: similarly to what is done in game tree search,
better choices are identified using lookahead, based on solutions obtained by
repeatedly using a greedy heuristic. This paper first illustrates how the Pilot
method improves upon some simple well known dispatch heuristics for the
job-shop scheduling problem. The Pilot method is then shown to be a special
case of the more recent Monte Carlo Tree Search (MCTS) methods: Unlike the
Pilot method, MCTS methods use random completion of partial solutions to
identify promising branches of the tree. The Pilot method and a simple version
of MCTS, using the $\epsilon$-greedy exploration paradigms, are then
compared within the same framework, consisting of 300 scheduling problems of
varying sizes with a fixed budget of rollouts. Results demonstrate that MCTS
reaches results as good as or better than those of the Pilot method in this context. Comment: Learning and Intelligent OptimizatioN (LION'6) 7219 (2012
Braess's Paradox in Wireless Networks: The Danger of Improved Technology
When comparing new wireless technologies, it is common to consider the effect
that they have on the capacity of the network (defined as the maximum number of
simultaneously satisfiable links). For example, it has been shown that giving
receivers the ability to do interference cancellation, or allowing transmitters
to use power control, never decreases the capacity and can in certain cases
increase it by , where is the
ratio of the longest link length to the smallest transmitter-receiver distance
and is the maximum transmission power. But there is no reason to
expect the optimal capacity to be realized in practice, particularly since
maximizing the capacity is known to be NP-hard. In reality, we would expect
links to behave as self-interested agents, and thus when introducing a new
technology it makes more sense to compare the values reached at game-theoretic
equilibria than the optimum values.
In this paper we initiate this line of work by comparing various notions of
equilibria (particularly Nash equilibria and no-regret behavior) when using a
supposedly "better" technology. We show a version of Braess's Paradox for all
of them: in certain networks, upgrading technology can actually make the
equilibria \emph{worse}, despite an increase in the capacity. We construct
instances where this decrease is a constant factor for power control,
interference cancellation, and improvements in the SINR threshold (),
and is when power control is combined with interference
cancellation. However, we show that these examples are basically tight: the
decrease is at most O(1) for power control, interference cancellation, and
improved , and is at most when power control is
combined with interference cancellation.